Abstract

Data compression has been widely applied in many data processing
areas. Compression methods use variable-size codes with the shorter codes
assigned to symbols or groups of symbols that appear in the data
frequently. Fibonacci coding, as a representative of these codes, is used for
compressing small numbers. The time consumption of a decompression
algorithm is not usually as important as the time of a compression algorithm.
However, the efficiency of the decompression may be a critical issue in some
cases. For example, a real-time compression of tree data structures raises
this issue. A tree's pages are decompressed during every read from
secondary storage into the main memory. In this case, the efficiency of
the decompression algorithm is extremely important. We have developed
a Fast Fibonacci decompression for this purpose. Our approach is up to
3.5x faster than the original implementation.
1 Introduction



Data compression has been widely applied in many data processing areas. Var- 
ious compression algorithms were developed for processing text documents, im- 
ages, video, etc. In particular, data compression is of foremost importance and 
has been quite well researched as it is presented in excellent surveys [H [12] . 

Various codes have been applied for data compression. In contrast to fixed- 
size codes, statistical methods use variable-size codes, with the shorter codes 
assigned to symbols or groups of symbols that have a higher probability of 
occurrence. Designers and implementors of variable-size codes have to deal with 
these two problems: (1) assigning codes that can be decoded unambiguously and 
(2) assigning codes with the minimum average size. 
A prefix code is a variable-size code that satisfies the prefix attribute. The
binary representation of the integers does not satisfy the prefix attribute. One
disadvantage of this representation is that the size n of the set of integers has
to be known in advance since it determines the code size as
$1 + \lfloor \log_2 n \rfloor$. In
some applications, a prefix code is required to code a set of integers whose
size is not known in advance. Several codes such as Elias codes [Jj, Fibonacci
codes Q], Golomb codes [6], [11] and Huffman codes [7] have been developed.
Fibonacci coding is distinguished as a suitable coding for the compression of
small numbers [5].

There are applications where asymmetric algorithms are applied. Let us 
consider a real-time compression of data structures [TU1 |2J. In this case, 
time consumption of a decompression algorithm is more important than the 
time of a compression algorithm. When a user query is evaluated, tree's pages 
are retrieved from a secondary storage and they are decompressed in the main 
memory. Consequently, a tree operation, like point or range query [3], works 
with the decompressed pages. Multidimensional data structures cluster similar 
tuples on a page [S]. When difference coding [5] is applied to tuple coordinates,
the values to be compressed are small. Fibonacci coding is therefore suitable
for the compression of such data. Since the page decompression is processed in
real-time, the decompression algorithm must be as fast as possible.

The original implementation of Fibonacci coding is not suitable for
real-time decompression. Therefore, we developed a fast implementation of
Fibonacci coding, which is described in this article. In Section 2, theoretical
issues of the fast implementation are presented. In Section 3, the Fast Fibonacci
decompression is described. Since the decompression is more important than
the compression in our case, we emphasize the decompression algorithm.
Moreover, the original compression algorithm for Fibonacci coding is more
efficient than the original decompression algorithm. In Section 4, experimental
results are presented. Our implementation achieves a speedup factor of up to
3.5. In the last section, we conclude this paper and outline future work.



2 Theoretical Issues of the Fast Fibonacci Coding

Fibonacci coding is based on Fibonacci numbers, and was defined by Apostolico
and Fraenkel PQ. While Fibonacci codes are not asymptotically optimal, they
perform well compared to the Elias codes as long as the number of source
messages is not too large. The Fibonacci code has the additional attribute of
robustness, which manifests itself by the local containment of errors.

Every positive integer n has exactly one binary representation of the form
$n = \sum_{i=1}^{p} a_i F_i$, where $a_i$ is either 0 or 1 and $F_i$ are the
Fibonacci numbers 1, 2, 3, 5, 8, 13, ..., in which the string
$a_1 a_2 \ldots a_p$ does not contain any adjacent 1-bits. Let us define
$F_0 = 1$ and $F_i = 0$ for $i < 0$.

Fibonacci numbers can be used to construct a prefix code. We
use the property that the Fibonacci representation of an integer does not have
any adjacent 1-bits. If n is a positive integer, we construct its Fibonacci
representation and append a 1-bit to the result. The Fibonacci representation of
the integer 5 is 0001; consequently, the Fibonacci-prefix code of 5 is 00011.
Each of these codes ends with two adjacent 1-bits, and this pair occurs nowhere
else in the code, so they can be decoded uniquely. However, the property of not
having adjacent 1-bits restricts the number of binary patterns available for
such codes, so they are longer than the other codes.

Table 1: Examples of the Fibonacci code for small numbers

    n    F(n)
    1    11
    2    011
    3    0011
    4    1011
    5    00011
    6    10011
    7    01011
    8    000011

Formally, the Fibonacci code for n is defined as

$$F(n) = a_1 a_2 \ldots a_p 1$$

In other words, the Fibonacci representation is written in reversed order,
with the least-significant digit first, and a 1-bit is appended. The Fibonacci
code values for a small subset of the integers are displayed in Table 1. The
value V(F(n)) of the Fibonacci code F(n) is defined as V(F(n)) = n.
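The construction of F(n) can be sketched in a few lines of Python. This is
only an illustration of the definition above, not the authors' compression
routine; the function name `fib_encode` and the use of a bit string are
assumptions made for readability.

```python
def fib_encode(n):
    """Return F(n) as a bit string, first stream bit on the left."""
    if n < 1:
        raise ValueError("Fibonacci coding is defined for positive integers")
    fibs = [1, 2]                      # F_1, F_2, ... in the text's indexing
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    digits = []
    for f in reversed(fibs):           # greedy choice, largest number first
        if f <= n:
            digits.append("1")
            n -= f
        else:
            digits.append("0")
    # digits are most-significant first; drop leading zeros, reverse them so
    # that a_1 comes first, and append the terminating 1-bit
    return "".join(digits).lstrip("0")[::-1] + "1"


# Reproduces the rows of Table 1
for n in range(1, 9):
    print(n, fib_encode(n))
```

The greedy choice of the largest Fibonacci number never selects two
neighbouring numbers, which is why the digit string has no adjacent 1-bits
before the terminator is appended.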

When the Fibonacci decompression is performed, the compressed memory
is read bit by bit. Every bit stands for one number in the Fibonacci sequence.
This number is added to the decompressed number if the bit is not 0. The
addition stops when two adjacent 1-bits occur in the sequence. This procedure
includes time-consuming operations such as retrieving each bit from the
compressed memory. In Section 3 we introduce the Fast Fibonacci decompression
algorithm, which can be processed without retrieving every single bit from the
compressed memory. This algorithm utilizes a novel operation, the Fibonacci
shift.
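The bit-by-bit procedure described above can be sketched as follows. The
sketch assumes the compressed memory is given as a string of '0' and '1'
characters in stream order; the name `fib_decode_bitwise` is hypothetical and
the code is an illustration of the described procedure, not the original
implementation that was measured.

```python
def fib_decode_bitwise(bits):
    """Decode a bit string of concatenated Fibonacci codes one bit at a time."""
    numbers = []
    value = 0
    prev = "0"
    f_cur, f_next = 1, 2               # F_1 and F_2 in the text's indexing
    for b in bits:
        if b == "1" and prev == "1":   # second adjacent 1-bit ends the code
            numbers.append(value)
            value, prev = 0, "0"
            f_cur, f_next = 1, 2
            continue
        if b == "1":
            value += f_cur             # this bit contributes its Fibonacci number
        f_cur, f_next = f_next, f_cur + f_next
        prev = b
    return numbers                     # an unfinished trailing code is dropped


print(fib_decode_bitwise("101101011"))  # codes 1011 and 01011
```

Every iteration touches a single bit, which is the cost the table-driven
approach of Section 3 is designed to avoid.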

Definition 1 (Value of extended Fibonacci code).

Let $a_1 a_2 \ldots a_k * a_{k+1} a_{k+2} \ldots a_p 1$ be an extended Fibonacci
code of a Fibonacci code $a_1 a_2 \ldots a_p 1$. Let us denote V as the value of
the extended Fibonacci code. We define
$V(a_1 a_2 \ldots a_k * a_{k+1} a_{k+2} \ldots a_p 1)$ as
$\sum_{i=1}^{p} a_i F_{i-k}$, where $F_i = 0$ for $i < 0$.

Definition 2 (Fibonacci shift).

Let F(n) be the Fibonacci code for n. Let $k \ge 0$ be an integer. Let
$F(n) \ll_F k$ denote the k-th Fibonacci right shift, defined as

$$F(n) \ll_F k = \underbrace{00 \ldots 0}_{k} a_1 a_2 \ldots a_p 1$$

The Fibonacci left shift is defined as

$$F(n) \gg_F k = a_1 a_2 \ldots a_k * a_{k+1} a_{k+2} \ldots a_p 1$$

It is easy to show that the result of a Fibonacci right shift is again a
Fibonacci code and that $F(n) \ll_F 0 = F(n) \gg_F 0 = F(n)$.

For example, $F(1) \ll_F 2 = F(3)$, $F(2) \ll_F 3 = F(8)$ and
$F(6) \gg_F 3 = F(1)$.

Informally, we need to compute V(01011) based on V(1011). We have
$V(1011) = F_3 + F_1$ and
$V(01011) = F_4 + F_2 = F_3 + F_2 + F_1 + F_0 = (F_3 + F_1) + (F_2 + F_0) =
V(1011) + V(1 * 011)$. It means V(01011) = V(1011) + V(1 * 011). Formally, it
can be written as $V(F(4) \ll_F 1) = V(F(4)) + V(F(4) \gg_F 1)$.
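The two shifts and the value function of Definition 1 can be tried out with a
short Python sketch. Codes are represented as bit strings, and the helper
names `fib`, `shift_right` and `value` are assumptions introduced only for
this illustration.

```python
def fib(i):
    """F_i in the text's indexing: F_0 = F_1 = 1, F_2 = 2, F_i = 0 for i < 0."""
    if i < 0:
        return 0
    a, b = 1, 1
    for _ in range(i):
        a, b = b, a + b
    return a


def shift_right(code, k):
    """Fibonacci right shift: prepend k zero bits to the code string."""
    return "0" * k + code


def value(code, k=0):
    """Value of the code after a left shift by k (Definition 1).

    With k = 0 this is the ordinary value V(F(n)) = n.
    """
    digits = code[:-1]                 # drop the terminating 1-bit
    return sum(fib(i - k) for i, a in enumerate(digits, start=1) if a == "1")


print(value(shift_right("11", 2)))     # F(1) shifted right by 2
print(value("10011", 3))               # F(6) shifted left by 3
print(value("1011") + value("1011", 1), value("01011"))
```

The last line evaluates both sides of the informal identity for F(4); the
first two evaluate the shift examples given after Definition 2.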

Theorem 1.

Let F(n) be the Fibonacci code for n. Then

$$V(F(n) \ll_F k) = F_k \times V(F(n)) + F_{k-1} \times V(F(n) \gg_F 1)$$

Proof.

This theorem can be proved by mathematical induction. First, we show that
the statement holds when k = 0.

$$V(F(n) \ll_F 0) = F_0 \times V(F(n)) + F_{-1} \times V(F(n) \gg_F 1)
= 1 \times V(F(n)) + 0 \times V(F(n) \gg_F 1)
= V(F(n))$$

For k = 1 the statement follows directly from Definition 1, because
$F_{i+1} = F_i + F_{i-1}$ gives
$V(F(n) \ll_F 1) = V(F(n)) + V(F(n) \gg_F 1)$ and $F_1 = F_0 = 1$.

By the induction hypothesis, we suppose that the theorem holds for all j,
$0 \le j < k$, where $k \ge 2$. We must prove that

$$V(F(n) \ll_F k) = F_k \times V(F(n)) + F_{k-1} \times V(F(n) \gg_F 1)$$

Let $F(n) = a_1 a_2 \ldots a_p 1$. Then

$$V(F(n) \ll_F k) = \sum_{i=1}^{p} a_i F_{k+i}$$
$$= \sum_{i=1}^{p} a_i F_{k+i-1} + \sum_{i=1}^{p} a_i F_{k+i-2}$$
$$= V(F(n) \ll_F (k-1)) + V(F(n) \ll_F (k-2))$$
$$= F_{k-1} \times V(F(n)) + F_{k-2} \times V(F(n) \gg_F 1)
 + F_{k-2} \times V(F(n)) + F_{k-3} \times V(F(n) \gg_F 1)$$
$$= (F_{k-1} + F_{k-2}) \times V(F(n)) + (F_{k-2} + F_{k-3}) \times V(F(n) \gg_F 1)$$
$$= F_k \times V(F(n)) + F_{k-1} \times V(F(n) \gg_F 1) \qquad \blacksquare$$
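As a worked instance of Theorem 1, take n = 2 with F(2) = 011, so that only
$a_2 = 1$. The two shift amounts below are chosen for illustration: k = 3
matches the example given after Definition 2, and k = 7 is the shift that
occurs in Example 1 of Section 3.

```latex
\begin{align*}
V(F(2)) &= F_2 = 2, \qquad
V(F(2) \gg_F 1) = 0 \cdot F_0 + 1 \cdot F_1 = 1, \\
V(F(2) \ll_F 3) &= F_3 \cdot 2 + F_2 \cdot 1 = 3 \cdot 2 + 2 \cdot 1 = 8 = V(F(8)), \\
V(F(2) \ll_F 7) &= F_7 \cdot 2 + F_6 \cdot 1 = 21 \cdot 2 + 13 \cdot 1 = 55 = F_9 .
\end{align*}
```

Both results agree with the direct reading of the shifted code: after a right
shift by k, the single 1-bit of F(2) sits at position k + 2, so its value is
$F_{k+2}$.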



3 The Fast Fibonacci Decompression Algorithm 

The proposed Fibonacci decompression method is based on a precomputed
mapping table. This table allows converting segments of compressed memory
directly into decompressed numbers. A segment size of 1 byte has an advantage
because it can be handled fast and it leads to a reasonable size of the mapping
table. The length of the mapping table increases exponentially with the size of
the segment. However, in Section 4 we show that the proposed approach can
produce very good results even for small segment sizes like 1 byte.
Consequently, the exponential space complexity is not a problem.

The first step in the proposed algorithm is to create a mapping table for
a specified segment size. Let S denote the segment size in bits. Every segment
of memory is a number, which points to a specific record in the mapping table.
This means that a mapping table has to have $2^S$ records.

One record contains the following information: 

• Count - the count of numbers which are decompressed from a segment. The
maximal value of Count is half of S because every compressed number
occupies at least two bits in compressed memory.

• Numbers[Count] - the array holding the numbers, which are either
processed further in some cases or just sent to the output as resulting
decompressed numbers.

• Shift - if the last number is not fully decompressed, the Shift value is
the bit size of the last number, otherwise the Shift value is 0. Therefore,
the Shift value is 0 if the segment ends with the two 1-bits that terminate
a code.

• EndWithZero - this flag is true if the segment ends with the 0-bit.

• StartWithZero - this flag is true if the segment starts with the 0-bit.

[Figure 1 shows two consecutive segments of compressed memory, with values
173 (bits 10110101) and 165 (bits 10100101), which contain the codes F(4) and
F(7) and the beginning of F(86).]

Figure 1: Example of compressed memory where the second segment has to be
searched in MAP2 (the least-significant bit is on the left side of the byte).

It is possible that the first 1-bit in a segment completes the compressed
number from a previous segment, as shown in Figure 1. Due to this fact, it
is necessary to have two mapping tables. Let MAP1 denote the first mapping
table and MAP2 the second mapping table. When the i-th record is created
in MAP1, the number i is the input for the record creation. The number i
is decompressed bit by bit by the ordinary Fibonacci decompression and each
number which is decompressed is stored in the Numbers array.

Odd-numbered records in MAP2 are created similarly to records in MAP1;
only the lowest bit of the number i is omitted. Even-numbered records are the
same as in MAP1. Therefore, it is possible to implement them as pointers to
the corresponding records in MAP1 to save some space in memory.
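The record creation described above can be sketched in Python for S = 8. The
field names follow the list of record fields; the class `Record`, the helper
names and the choice to reuse MAP1 objects for even-numbered MAP2 entries are
assumptions of this sketch rather than details taken from the authors'
implementation.

```python
from dataclasses import dataclass

S = 8                                   # segment size in bits (one byte)
FIB = [1, 1, 2, 3, 5, 8, 13, 21, 34]    # FIB[i] = F_i for i = 0..8


@dataclass
class Record:
    count: int
    numbers: tuple
    shift: int
    end_with_zero: bool
    start_with_zero: bool


def stream_bits(segment):
    """Bits of a segment in stream order (least-significant bit first)."""
    return [(segment >> i) & 1 for i in range(S)]


def make_record(segment, skip_first_bit=False):
    bits = stream_bits(segment)
    numbers, value, length, prev = [], 0, 0, 0
    for b in (bits[1:] if skip_first_bit else bits):
        if b == 1 and prev == 1:        # second adjacent 1-bit ends a code
            numbers.append(value)
            value, length, prev = 0, 0, 0
        else:
            length += 1
            value += b * FIB[length]
            prev = b
    if length:                          # bits of an unfinished last number
        numbers.append(value)
    return Record(len(numbers), tuple(numbers), length,
                  bits[-1] == 0, bits[0] == 0)


MAP1 = [make_record(i) for i in range(1 << S)]
# Odd segments start with a 1-bit, which is omitted; even ones share MAP1 records.
MAP2 = [make_record(i, skip_first_bit=True) if i & 1 else MAP1[i]
        for i in range(1 << S)]

print(MAP1[173])
print(MAP2[165])
```

Note that the two flags always describe the whole segment, even for an
odd-numbered MAP2 record whose first bit is skipped during decoding.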

Once MAP1 and MAP2 are created, they can be used for the fast
decompression algorithm described in Algorithm 1. The input compressed memory
is represented here as an array of segments s.

Example 1.

If we consider the example of compressed memory in Figure 1, the following
records of the mapping tables are accessed (the fields are listed in the order
Count, Numbers, Shift, EndWithZero, StartWithZero):

MAP1[173] = {2, (4, 7), 4, False, False}

MAP2[165] = {1, (31), 7, False, False}

Two numbers are obtained from the Numbers array of the first record. The
first number, 4, can be immediately stored in the result array and the second
number, 7, is stored in the lastNumber variable, which holds the uncompleted
number from the previous segment. We continue with the second record, read
from MAP2 because the previous record ends with a 1-bit. Since the second
segment starts with a 1-bit, it completes the number stored in the lastNumber
variable, and this variable is stored in result. The number 31 is copied into
the lastNumber variable and the number 7 into the shift variable. The next
segment starts with the sequence 011 = F(2). Therefore, the Fibonacci shift is
computed as $V(F(2) \ll_F 7) = 55$ and the result is added to lastNumber,
which gives 31 + 55 = 86. Afterwards, the lastNumber variable is stored in the
result array because this number is completed in the third segment.
input : Array of segments s = s_1, s_2, ..., s_k
output: Array of decompressed numbers result

// Functions F() and V() are defined in Section 2
shift ← 0;
lastNumber ← 0;
for j ← 1 to k do
    if shift = 0 or record.EndWithZero then
        record ← MAP1[s_j];
    else
        record ← MAP2[s_j];
        if not record.StartWithZero then
            result ← result ∪ lastNumber;
            shift ← 0;
        end
    end
    if shift = 0 then
        if record.Shift = 0 then
            result ← result ∪ {record.Numbers[i] : i = 1, ..., record.Count};
        else
            result ← result ∪ {record.Numbers[i] : i = 1, ..., record.Count - 1};
            lastNumber ← record.Numbers[record.Count];
        end
        shift ← record.Shift;
    else
        lastNumber ← lastNumber + V(F(record.Numbers[1]) <<_F shift);
        if record.Shift = 0 then
            result ← result ∪ lastNumber;
            shift ← 0;
            result ← result ∪ {record.Numbers[i] : i = 2, ..., record.Count};
        else
            if record.Count = 1 then
                shift ← shift + record.Shift;
            else
                shift ← record.Shift;
                result ← result ∪ lastNumber;
                result ← result ∪ {record.Numbers[i] : i = 2, ..., record.Count - 1};
                lastNumber ← record.Numbers[record.Count];
            end
        end
    end
end

Algorithm 1: Fast Fibonacci Decompression Algorithm
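A runnable Python sketch of Algorithm 1 for S = 8 is given below. It is an
illustration under several assumptions, not the measured implementation:
records are plain tuples (Numbers, Shift, EndWithZero, StartWithZero), the
tables are rebuilt in compact form so that the block runs on its own, and the
Fibonacci shift is evaluated by a greedy decomposition in the hypothetical
helper `shifted_value` instead of a precomputed form of Theorem 1. An
unfinished number at the end of the input is dropped.

```python
S = 8                                   # segment size in bits (one byte)
FIB = [1, 1]                            # FIB[i] = F_i, with F_0 = F_1 = 1
while len(FIB) < 128:
    FIB.append(FIB[-1] + FIB[-2])


def make_record(bits, first_bit, last_bit):
    """Decode bits one by one into (numbers, shift, end_zero, start_zero)."""
    numbers, value, length, prev = [], 0, 0, 0
    for b in bits:
        if b and prev:                  # second adjacent 1-bit ends a code
            numbers.append(value)
            value, length, prev = 0, 0, 0
        else:
            length += 1
            value += b * FIB[length]
            prev = b
    if length:
        numbers.append(value)           # unfinished last number
    return numbers, length, last_bit == 0, first_bit == 0


def build_maps():
    map1, map2 = [], []
    for seg in range(1 << S):
        bits = [(seg >> i) & 1 for i in range(S)]   # LSB is the first stream bit
        map1.append(make_record(bits, bits[0], bits[-1]))
        map2.append(make_record(bits[1:], bits[0], bits[-1])
                    if bits[0] else map1[seg])
    return map1, map2


def shifted_value(n, k):
    """V(F(n) <<_F k): move every Fibonacci term of n up by k positions.

    Sketch limit: n must be smaller than F_64 and the shifted positions
    must stay below 128.
    """
    total = 0
    for i in range(64, 0, -1):          # greedy decomposition, largest first
        if FIB[i] <= n:
            n -= FIB[i]
            total += FIB[i + k]
    return total


def fast_decode(segments, map1, map2):
    result, shift, last, end_zero = [], 0, 0, True
    for seg in segments:
        if shift == 0 or end_zero:
            numbers, rec_shift, end_zero, _ = map1[seg]
        else:
            numbers, rec_shift, end_zero, start_zero = map2[seg]
            if not start_zero:          # leading 1-bit completes the pending number
                result.append(last)
                shift = 0
        if shift == 0:
            if rec_shift == 0:
                result.extend(numbers)
            else:
                result.extend(numbers[:-1])
                last = numbers[-1]
            shift = rec_shift
        else:
            last += shifted_value(numbers[0], shift)
            if rec_shift == 0:
                result.append(last)
                result.extend(numbers[1:])
                shift = 0
            elif len(numbers) == 1:     # the whole segment continues the number
                shift += rec_shift
            else:
                result.append(last)
                result.extend(numbers[1:-1])
                last, shift = numbers[-1], rec_shift
    return result


MAP1, MAP2 = build_maps()
# Segments of Figure 1 followed by a byte that starts with 011 (value 6)
print(fast_decode([173, 165, 6], MAP1, MAP2))
```

The call at the end replays Example 1: the third byte, 6, is chosen here only
because its stream bits start with 011 and the rest is zero padding.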
4 Experimental Results



The proposed Fast Fibonacci decompression has been tested and compared to 
the original algorithm. The algorithms' performance has been tested for various 
test collections. The tests were performed on a PC with dual core AMD Opteron 
1.8, 1GB RAM and a hard drive with 7200RPM using Windows Server 2003 
64bit. 

The test collections used in the experiments all have the same size:
$2^{22} = 4{,}194{,}304$ numbers. The proposed algorithm is universal and it
may be applied to arbitrary numbers > 0. However, we worked with numbers up to
4,294,967,295, which is the maximal value of a 32-bit binary number.
The tested collections are as follows:

• SEQ_ALL - a sequence of numbers from 1 to 4,194,304. 

• SEQ_VerySmall - a collection containing a sequence of very small numbers
ranging from 1 to 255 (maximal value for the 8 bit-length number).

• SEQ_Small - a collection containing a sequence of small numbers ranging 
from 256 to 65,535 (maximal value for the 16 bit-length number). 

• SEQ_Large - a collection containing a sequence of large numbers ranging 
from 65,536 to 16,777,215 (maximal value for the 24 bit-length number). 

• SEQ_VeryLarge - a collection containing a sequence of very large numbers
ranging from 16,777,216 to 4,294,967,295 (maximal value for the 32
bit-length number).

• RAND_ALL - a collection of random numbers ranging from 1 to 
4,294,967,295. 

• RAND_VerySmall - a collection of random numbers ranging from 1 to 255. 

• RAND_Small - a collection of random numbers ranging from 256 to 65,535. 

• RAND_Large - a collection of random numbers ranging from 65,536 to 
16,777,215. 

• RAND_VeryLarge - a collection of random numbers ranging from 
16,777,216 to 4,294,967,295. 

This section describes the results obtained by the decompression algorithms
for the collections. The first test was performed on the sequential
collections and its results are shown in Table 2. The Fast Fibonacci
decompression algorithm is more than 3x faster than the original algorithm.

The second test was performed on the random collections. The experimental
results are shown in Table 3. The Fast Fibonacci decompression algorithm
achieves almost the same result for random numbers as for sequential numbers.

Decoding efficiency for particular numbers was tested for a collection with
$2^{22}$ numbers. In Figure 2 we observe decoding times for values depicted as
binary numbers with the exponent. Obviously, the fast algorithm is more than
3.5x faster than the original algorithm for each number.
Table 2: Fast Fibonacci decompression for sequential collections

    Collection       The Original      The Fast          Speedup
                     Algorithm [ms]    Algorithm [ms]
    SEQ_ALL                 943               265         3.56x
    SEQ_VerySmall           365               109         3.35x
    SEQ_Small               687               184         3.73x
    SEQ_Large               953               265         3.60x
    SEQ_VeryLarge         1,109               265         4.18x
    Avg.                  811.4             217.6         3.73x
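The Speedup column of Table 2 appears to be the original time divided by the
fast time, and the Avg. row the ratio of the two averaged times. This reading
is an assumption; the short Python sketch below recomputes the column from the
measured times of Table 2 under that assumption. The names `TABLE2_MS`,
`speedup` and `average_row` are introduced only for this check.

```python
# Decompression times in milliseconds, copied from Table 2:
# (original algorithm, fast algorithm)
TABLE2_MS = {
    "SEQ_ALL": (943, 265),
    "SEQ_VerySmall": (365, 109),
    "SEQ_Small": (687, 184),
    "SEQ_Large": (953, 265),
    "SEQ_VeryLarge": (1109, 265),
}


def speedup(original_ms, fast_ms):
    """Ratio of the original time to the fast time."""
    return original_ms / fast_ms


def average_row(table):
    """Average both time columns and return them with their ratio."""
    n = len(table)
    avg_original = sum(original for original, _ in table.values()) / n
    avg_fast = sum(fast for _, fast in table.values()) / n
    return avg_original, avg_fast, speedup(avg_original, avg_fast)


for name, (original, fast) in TABLE2_MS.items():
    print(f"{name:15s} {speedup(original, fast):.2f}x")
print("Avg.            %.1f %.1f %.2fx" % average_row(TABLE2_MS))
```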



Table 3: Fast Fibonacci decompression for random collections

    Collection       The Original      The Fast          Speedup
                     Algorithm [ms]    Algorithm [ms]
    RAND_ALL              1,000               297         3.37x
    RAND_VerySmall          359               109         3.29x
    RAND_Small              784               203         3.62x
    RAND_Large            1,084               318         3.41x
    RAND_VeryLarge        1,390               390         3.56x
    Avg.                  923.4             263.4         3.52x

[Figure 2 plots two series: the Original Decoding and the Fast Fibonacci
Decoding.]

Figure 2: Decoding efficiency for particular numbers
5 Conclusion



In this paper, a fast decompression algorithm for Fibonacci coding is
introduced. There are applications where the decompression is more important
than the compression. Moreover, the original compression algorithm for
Fibonacci coding is more efficient than the original decompression algorithm.
Therefore, this paper emphasizes the decompression algorithm. A novel
operation, the Fibonacci shift, was introduced and applied in the Fast
Fibonacci decompression algorithm. The proposed implementation is up to 3.5x
faster than the original implementation.